Unmasking Outliers in Large Distributed Databases Using Cluster Based Approach: CluBSOLD

Authors

  • A. Rama Satish
  • P. Bala
  • Krishna Prasad
  • D. Naga Raju
  • Ravi Kumar Saidala
Abstract

Outliers are data objects that are dissimilar to or inconsistent with the remaining objects in a data set, or that lie far from their cluster centroids. Detecting outliers is an important step in the Knowledge Data Discovery process for finding hidden knowledge. Outlier detection has been studied in many research areas, including Financial Data Analysis, Large Distributed Systems, Biological Data Analysis, Data Mining, Scientific Applications, and Health Monitoring. Existing research on outlier detection shows that density-based techniques are robust. Identifying outliers in a distributed environment is not simple, because processing a distributed database raises two major issues. The first is rendering the massive data generated by different databases. The second is data integration, which may cause data security violations and leakage of sensitive information. Handling a distributed database is therefore difficult. In this paper, we present a cluster-based outlier detection technique to spot outliers in large and vibrant (dynamically updated) distributed databases; it uses cell-density-based centralized detection to address both the massive data rendering problem and the data integration problem. Experiments conducted on various datasets clearly show the robustness of the proposed technique for finding outliers in large distributed databases.
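The abstract does not spell out the CluBSOLD algorithm, so the following is only a minimal sketch of the general idea of cell-density-based detection with a central coordinator, not the authors' method. It assumes that each site summarizes its data as per-cell point counts on a shared grid, that only these counts (not raw records) are sent to the coordinator, and that points in globally sparse cells are treated as outliers. The function names, the grid size `cell_size`, and the threshold `min_count` are all hypothetical.

```python
from collections import Counter
from math import floor


def cell_of(point, cell_size):
    """Map a point to the integer coordinates of the grid cell containing it."""
    return tuple(floor(x / cell_size) for x in point)


def local_cell_counts(points, cell_size):
    """Run at each site: summarize local data as per-cell counts (no raw records)."""
    return Counter(cell_of(p, cell_size) for p in points)


def sparse_cells(site_counts, min_count):
    """Run at the coordinator: merge the counts and return the globally sparse cells."""
    total = Counter()
    for counts in site_counts:
        total.update(counts)
    return {cell for cell, n in total.items() if n < min_count}


def local_outliers(points, sparse, cell_size):
    """Run at each site: flag local points that fall in a globally sparse cell."""
    return [p for p in points if cell_of(p, cell_size) in sparse]


# Hypothetical data held at two sites (assumption: 2-D numeric records).
site_a = [(0.1, 0.2), (0.3, 0.4), (9.5, 9.7)]
site_b = [(0.5, 0.1), (0.2, 0.8), (5.5, 5.5), (5.6, 5.4)]

cell_size, min_count = 1.0, 2
summaries = [local_cell_counts(s, cell_size) for s in (site_a, site_b)]
sparse = sparse_cells(summaries, min_count)

outliers_a = local_outliers(site_a, sparse, cell_size)
outliers_b = local_outliers(site_b, sparse, cell_size)
```

Under these assumptions, exchanging only cell counts keeps the transferred volume small and avoids shipping sensitive raw records, which loosely corresponds to the two issues the abstract names (massive data rendering and data integration).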


Similar resources

Outlier Detection in Wireless Sensor Networks Using Distributed Principal Component Analysis

Detecting anomalies is an important challenge for intrusion detection and fault diagnosis in wireless sensor networks (WSNs). To address the problem of outlier detection in wireless sensor networks, in this paper we present a PCA-based centralized approach and a DPCA-based distributed energy-efficient approach for detecting outliers in sensed data in a WSN. The outliers in sensed data can be ca...

Integrating and Mining Distributed Customer Databases

Large corporations often have different subunits sharing common customers, yielding distributed customer databases. Corporate risk and marketing functions seek areas where there is unusually high risk, or where one can target market. We present a three-phase process to solve this problem. First, we merge the distributed databases using decision tree induction into a database of unique customer...

CONTRIBUTIONS TO PARALLEL AND DISTRIBUTED COMPUTING IN KNOWLEDGE DISCOVERY AND DATA MINING

Recently databases are increasing continuously without bound, due to new data acquisition technologies. One challenge is how to gain knowledge from these large data sets. In this thesis, we analyze and improve the algorithmic solution of four problems related to knowledge discovery and data mining, making use of parallel computing; we also compare our results with related works. We design two p...

A Semantic Interpretation of Unusual Behaviors Extracted from Outliers of Moving Objects Trajectories

The increasing use of location-aware devices has generated a huge volume of data from satellite images and mobile sensors; these data can be classified as geographical data. Traces generated by objects moving over a geographical territory are usually modeled as streams of spatiotemporal points called trajectories. Integrating trajectory sample points with geographical and ...

A robust least squares fuzzy regression model based on kernel function

In this paper, a new approach is presented to fit a robust fuzzy regression model based on some fuzzy quantities. In this approach, we first introduce a new distance between two fuzzy numbers using the kernel function, and then, based on the least squares method, the parameters of the fuzzy regression model are estimated. The proposed approach has a suitable performance to...


Publication date: 2016